Pigeon and human performance in a multi-armed bandit task in response to changes in variable interval schedules.

Authors

  • Deborah Racey
  • Michael E Young
  • Dennis Garlick
  • Jennifer Ngoc-Minh Pham
  • Aaron P Blaisdell
Abstract

The tension between exploitation of the best options and exploration of alternatives is a ubiquitous problem that all organisms face. To examine this trade-off across species, pigeons and people were trained on an eight-armed bandit task in which the options were rewarded on a variable interval (VI) schedule. At regular intervals, each option's VI changed, thus encouraging dynamic increases in exploration in response to these anticipated changes. Both species showed sensitivity to the payoffs that was often well modeled by Luce's (1963) decision rule. For pigeons, exploration of alternative options was driven by experienced changes in the payoff schedules, not the beginning of a new session, even though each session signaled a new schedule. In contrast, people quickly learned to explore in response to signaled changes in the payoffs.
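Luce's (1963) decision rule, used here to model choice sensitivity to payoffs, assigns each option a choice probability proportional to a power of its value. A minimal sketch (the function name and the `beta` sensitivity parameter are illustrative, not from the paper):

```python
def luce_choice_probs(values, beta=1.0):
    """Luce's (1963) ratio rule: P(i) = v_i**beta / sum_j v_j**beta.

    values: estimated payoff values for each option (e.g., bandit arm).
    beta:   sensitivity parameter; higher beta concentrates choice
            on the best options (more exploitation, less exploration).
    """
    weights = [v ** beta for v in values]
    total = sum(weights)
    return [w / total for w in weights]

# Three options with payoff rates 2, 1, 1 and beta = 1:
probs = luce_choice_probs([2.0, 1.0, 1.0])
# probs == [0.5, 0.25, 0.25]
```

With `beta = 1` the rule reduces to strict matching of choice proportions to relative payoffs; fitted values of `beta` above or below 1 capture over- and undermatching, respectively.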

Similar articles

Bayesian and Approximate Bayesian Modeling of Human Sequential Decision-Making on the Multi-Armed Bandit Problem

In this paper we investigate human exploration/exploitation behavior in a sequential decision-making task. Previous studies have suggested that people are suboptimal at scheduling exploration, and that heuristic decision strategies are better predictors of human choices than the optimal model. By incorporating more realistic assumptions about subjects' knowledge and limitations into models of belief...

Active Learning by Learning

Pool-based active learning is an important technique that helps reduce labeling efforts within a pool of unlabeled instances. Currently, most pool-based active learning strategies are constructed based on some human-designed philosophy; that is, they reflect what human beings assume to be “good labeling questions.” However, while such human-designed philosophies can be useful on specific data s...

Human behavior in contextual multi-armed bandit problems

In real-life decision environments people learn from their direct experience with alternative courses of action. Yet they can accelerate their learning by using functional knowledge about the features characterizing the alternatives. We designed a novel contextual multi-armed bandit task where decision makers chose repeatedly between multiple alternatives characterized by two informative featur...

Budgeted Learning, Part I: The Multi-Armed Bandit Case

We introduce and motivate the task of learning under a budget. We focus on a basic problem in this space: selecting the optimal bandit after a period of experimentation in a multi-armed bandit setting, where each experiment is costly, our total costs cannot exceed a fixed pre-specified budget, and there is no reward collection during the learning period. We address the computational complexity ...

Multi-armed Bandit Formulation of the Task Partitioning Problem in Swarm Robotics

Task partitioning is a way of organizing work consisting in the decomposition of a task into smaller sub-tasks that can be tackled separately. Task partitioning can be beneficial in terms of reduction of physical interference, increase of efficiency, higher parallelism, and exploitation of specialization. However, task partitioning also entails costs in terms of coordination efforts and overhea...


Journal:
  • Learning & Behavior

Volume 39, Issue 3

Pages: –

Publication year: 2011